The NFL has a vast amount of data on teams, games, and players, and it is all freely available on their website. The problem is, this is not a usable format for data scientists to work with. If you Google "NFL database" you are limited to websites that let you search for specific statistics, or paid full datasets. NFL data is big business.
There is a good article that is essentially the problem definition here: https://codeandfootball.wordpress.com/2011/02/15/so-where-can-i-find-free-nfl-data-sets/
And it appears that since that was written in 2011, there still hasn't been much headway in addressing the problem. So this project aims to fill that void in a modest, yet useful way. Screen-scraping is still at the core of the solution, which is really the core of the problem as well. If the data were readily available in any other format, this would be a piece of cake. But unfortunately, preprocessing the data is about 75% of the work.
Some observations:
1.) There are a finite and manageable number of teams (32, currently), so putting these in by hand was trivial.
2.) The NFL website makes it rather easy for automated screen scraping, particularly for player data. The URIs have the following format: www.nfl.com/player/{full_player_name}/{nfl_player_id}/profile?season={season_year}. One caveat to this though, is that if a player stopped playing in previous years, each year after that will list his last year's stats, whereas if you select a year before they were in the NFL, it simply says there are no stats available. A nice gotcha.
3.) NFL Player IDs are available wherever there is a link to the player. A good starting point is the Stats by Position page: http://www.nfl.com/players/search?category=position. I only went with current players, but for future expansion, we could look at historical players too.
4.) Games are available dating back to 1970 on the Schedules page: http://www.nfl.com/schedules
For the most part, this is a pretty straightforward exercise. Scrape the data, munge it into a TSV, pull the TSV into a database. There's always a catch or two. Real life isn't as structured as we would like it to be. Teams change cities. The L.A. Rams become the St. Louis Rams, the Houston Oilers become the Tennessee Oilers and then the Tennessee Titans, and the L.A. Raiders become (once again), the Oakland Raiders. Man, why can't Los Angeles keep a football team?
Additionally, players can change position too. Take Terrelle Pryor. He is listed on the NFL website as a WR, yet, he saw most of his action as a QB with the Raiders. But instead of putting in the passing data along with his WR stats, they just list the WR stats. So strictly using the NFL website, that data is not available. ESPN actually does a better job and lists all stats (even defensive in case that weird situation were to arise). In hindsite, perhaps EPSN would have been a better source.
This is meant to be a starting block. A jumping off point. I would like to build on this and create an all-encompassing NFL database, free for anyone to use. Additionally, this was an excuse for me to learn new technologies. So please excuse my Python, it's my first attempt. My code isn't normally this sloppy, but there was a lot of experimentation going on.
Most of the munging is done in plain ol' Bash using a lot of sed and grep. I recently discovered that you can execute Bash scripts directly from Jupyter, but I haven't tried it. So instead, I just listed my scripts here. And, probably a no-no as far as these projects go, I didn't list the little one-liners I used, so that's going to make reproducing this a little harder. However, my goal is to completely automate this so I can schedule it to run each Monday morning after the games have played for the week and update the database so we can have up-to-the-week data for our analyses and applications.
As long as football is a sport, I think this will be a work in progress. Besides weekly updates during football season, future enhancements include:
If you wish to contribute, or have any feature requests please let me know, and if you find this useful, star it on Github or let me know on Twitter (@detweiler).
As mentioned, I am currently only grabbing offensive player data dating back to 2009.
We begin at the player stats page, located here: http://www.nfl.com/stats/categorystats?tabSeq=1&statisticPositionCategory=QUARTERBACK&season=2015&seasonType=REG
I grab each page with wget. For example,
wget 'http://www.nfl.com/stats/categorystats?archive=true&conference=null&statisticPositionCategory=QUARTERBACK&season=2009&seasonType=REG&experience=&tabSeq=1&qualified=false&Submit=Go' -O QB_2009_1.txt wget 'http://www.nfl.com/stats/categorystats?tabSeq=1&season=2009&seasonType=REG&experience=&Submit=Go&archive=true&d-447263-p=2&conference=null&statisticPositionCategory=QUARTERBACK&qualified=false' -O QB_2009_2.txt
Now that we've got all these index pages, we have links to all the players
cat QB_20* | grep profile?id | sed -e 's/^[ \t]//g; s/a href="/www.nfl.com/; s/".//g; s/<//g' | sort | uniq > qb_all.html cat RB_20* | grep profile?id | sed -e 's/^[ \t]//g; s/a href="/www.nfl.com/; s/".//g; s/<//g' | sort | uniq > rb_all.html cat TE_20* | grep profile?id | sed -e 's/^[ \t]//g; s/a href="/www.nfl.com/; s/".//g; s/<//g' | sort | uniq > te_all.html cat WR_20* | grep profile?id | sed -e 's/^[ \t]//g; s/a href="/www.nfl.com/; s/".//g; s/<//g' | sort | uniq > wr_all.html cat K_20* | grep profile?id | sed -e 's/^[ \t]//g; s/a href="/www.nfl.com/; s/".//g; s/<//g' | sort | uniq > k_all.html
...except they are redirects. For example, www.nfl.com/players/brandonweeden/profile?id=WEE221487 redirects to http://www.nfl.com/player/brandonweeden/2532970/profile
So we need to grab those. By requesting each of those pages, we are presented with a 302 redirect and a Location header. The Location is what we are interested in...
wget -i qb_all.html --max-redirect=0 2>&1 > /dev/null | grep Location | sed -e 's/Location: /www.nfl.com/; s/\ .//g; s/profile/gamelogs/' | sort | uniq > qb_final.txt wget -i rb_all.html --max-redirect=0 2>&1 > /dev/null | grep Location | sed -e 's/Location: /www.nfl.com/; s/\ .//g; s/profile/gamelogs/' | sort | uniq > rb_final.txt wget -i wr_all.html --max-redirect=0 2>&1 > /dev/null | grep Location | sed -e 's/Location: /www.nfl.com/; s/\ .//g; s/profile/gamelogs/' | sort | uniq > wr_final.txt wget -i te_all.html --max-redirect=0 2>&1 > /dev/null | grep Location | sed -e 's/Location: /www.nfl.com/; s/\ .//g; s/profile/gamelogs/' | sort | uniq > te_final.txt wget -i k_all.html --max-redirect=0 2>&1 > /dev/null | grep Location | sed -e 's/Location: /www.nfl.com/; s/\ .*//g; s/profile/gamelogs/' | sort | uniq > k_final.txt
Now we've got a direct link to all each player's stats. But we're not out of the woods yet. We need to pull all those down, and then parse those individual files.
I wrote a shell script that iterates over qb_final.txt and pulls down all the data files:
This does a pretty fantastic job. Remarkably, the NFL does not throttle or do any kind of gatekeeping. I was able to hammer their website with no protest. Thanks, NFL! (The only time you'll hear me say that.)
Now, we've got the raw HTML files, but we need this in some sort of usable format. I like Tab Separated Values, so we'll try to convert to that. The following is a very ugly glued-together script THAT WORKS. I'm not going for style point shere.
Now we have everything in TSVs. The next step is to get this all into a database so we can query it!
We'll be using SQLite since it's portable. Once in SQLite, you are free to ETL it into whatever dbms you want.
In [ ]:
In [1]:
## Storm ORM Objects
from storm.locals import *
import csv
import re
from pprint import pprint
import os
import math
from os import listdir
from os.path import isfile, join
from os import walk
class Player(object):
__storm_table__ = "PLAYER"
PLAYER_ID = Int(primary=True)
FIRST_NAME = Unicode()
MIDDLE_NAME = Unicode()
LAST_NAME = Unicode()
NAME_SUFFIX = Unicode()
HEIGHT = Float()
WEIGHT = Float()
BIRTH_DATE = Date()
BIRTH_CITY = Unicode()
BIRTH_STATE = Unicode()
BIRTH_COUNTRY = Unicode()
HIGH_SCHOOL_NAME = Unicode()
HIGH_SCHOOL_CITY = Unicode()
HIGH_SCHOOL_STATE = Unicode()
COLLEGE = Unicode()
def __init__(self, player_id, first_name, middle_name, last_name, name_suffix, height, weight, birth_date, birth_city, birth_state, birth_country, high_school_name, high_school_city, high_school_state, college):
self.PLAYER_ID = int(player_id)
self.FIRST_NAME = u'' + first_name
self.MIDDLE_NAME = u'' + middle_name
self.LAST_NAME = u'' + last_name
self.NAME_SUFFIX = u'' + name_suffix
self.HEIGHT = float(height)
self.WEIGHT = float(weight)
if (birth_date != ''):
self.BIRTH_DATE = birth_date
self.BIRTH_CITY = u'' + birth_city
self.BIRTH_STATE = u'' + birth_state
self.BIRTH_COUNTRY = u'' + birth_country
self.HIGH_SCHOOL_NAME = u'' + high_school_name
self.HIGH_SCHOOL_CITY = u'' + high_school_city
self.HIGH_SCHOOL_STATE = u'' + high_school_state
self.COLLEGE = u'' + college
def __str__(self):
return 'Player[ PLAYER_ID = ' + str(player_id) + ', FIRST_NAME = ' + first_name + ', MIDDLE_NAME = ' + middle_name + ', LAST_NAME = ' + last_name + ', NAME_SUFFIX = ' + name_suffix + ', HEIGHT = ' + str(height) + ', WEIGHT = ' + str(weight) + ', BIRTH_DATE = ' + str(birth_date) + ', BIRTH_CITY = ' + birth_city + ', BIRTH_STATE = ' + birth_state + ', BIRTH_COUNTRY = ' + birth_country + ', HIGH_SCHOOL_NAME = ' + high_school_name + ', HIGH_SCHOOL_CITY = ' + high_school_city + ', HIGH_SCHOOL_STATE = ' + high_school_state + ', COLLEGE = ' + college + ' ]'
class Team(object):
__storm_table__ = "TEAM"
TEAM_ID = Unicode(primary=True)
TEAM_ABBRV = Unicode()
LONG_NAME = Unicode()
CONFERENCE = Unicode()
DIVISION = Unicode()
def __init__(self, team_id, team_abbrv, long_name, conference, division):
self.TEAM_ID = team_id
self.TEAM_ABBRV = team_abbrv
self.LONG_NAME = long_name
self.CONFERENCE = conference
self.DIVISION = division
def verbose_team_name(team_name):
if(team_name.lower() == 'Buccaneers') or (team_name.lower() == 'TB'):
return 'Tampa Bay Buccaneers'
if(team_name.lower() == 'Vikings') or (team_name.lower() == 'MIN'):
return 'Minnesota Vikings'
if(team_name.lower() == 'Jets') or (team_name.lower() == 'NYJ'):
return 'New York Jets'
if(team_name.lower() == 'Texans') or (team_name.lower() == 'HOU'):
return 'Houston Texans'
if(team_name.lower() == 'Bears') or (team_name.lower() == 'CHI'):
return 'Chicago Bears'
if(team_name.lower() == 'Cowboys') or (team_name.lower() == 'DAL'):
return 'Dallas Cowboys'
if(team_name.lower() == 'Buccaneers') or (team_name.lower() == 'TB'):
return 'Tampa Bay Buccaneers'
if(team_name.lower() == 'Buccaneers') or (team_name.lower() == 'TB'):
return 'Tampa Bay Buccaneers'
if(team_name.lower() == 'Bengals') or (team_name.lower() == 'CIN'):
return 'Cincinnatti Bengals'
if(team_name.lower() == '49ers') or (team_name.lower() == 'SF'):
return 'San Francisco 49ers'
if(team_name.lower() == 'Bills') or (team_name.lower() == 'BUF'):
return 'Buffalo Bills'
if(team_name.lower() == 'Broncos') or (team_name.lower() == 'DEN'):
return 'Denver Broncos'
if(team_name.lower() == 'Browns') or (team_name.lower() == 'CLE'):
return 'Cleveland Browns'
if(team_name.lower() == 'Cardinals') or (team_name.lower() == 'ARI'):
return 'Arizona Cardinals'
if(team_name.lower() == 'Chargers') or (team_name.lower() == 'SD'):
return 'San Diego Chargers'
if(team_name.lower() == 'Chiefs') or (team_name.lower() == 'KC'):
return 'Kansas City Chiefs'
if(team_name.lower() == 'Colts') or (team_name.lower() == 'IND'):
return 'Indianapolis Colts'
if(team_name.lower() == 'Dolphins') or (team_name.lower() == 'MIA'):
return 'Miami Dolphins'
if(team_name.lower() == 'Eagles') or (team_name.lower() == 'PHI'):
return 'Philadelphia Eagles'
if(team_name.lower() == 'Falcons') or (team_name.lower() == 'ATL'):
return 'Atlanta Falcons'
if(team_name.lower() == 'Giants') or (team_name.lower() == 'NYG'):
return 'New York Giants'
if(team_name.lower() == 'Jaguars') or (team_name.lower() == 'JAX'):
return 'Jacksonville Jaguars'
if(team_name.lower() == 'Lions') or (team_name.lower() == 'DET'):
return 'Detroit Lions'
if(team_name.lower() == 'Oilers') or (team_name.lower() == 'OIL'):
return 'Houston Oilers'
if(team_name.lower() == 'Packers') or (team_name.lower() == 'GB'):
return 'Green Bay Packers'
if(team_name.lower() == 'Panthers') or (team_name.lower() == 'CAR'):
return 'Carolina Panthers'
if(team_name.lower() == 'Patriots') or (team_name.lower() == 'NE'):
return 'New England Patriots'
if(team_name.lower() == 'Raiders') or (team_name.lower() == 'OAK'):
return 'Oakland Raiders'
if(team_name.lower() == 'Rams') or (team_name.lower() == 'STL'):
return 'St. Louis Rams'
if(team_name.lower() == 'Ravens') or (team_name.lower() == 'BAL'):
return 'Baltimore Ravens'
if(team_name.lower() == 'Redskins') or (team_name.lower() == 'WAS'):
return 'Washington Redskins'
if(team_name.lower() == 'Saints') or (team_name.lower() == 'NO'):
return 'New Orleans Saints'
if(team_name.lower() == 'Seahawks') or (team_name.lower() == 'SEA'):
return 'Seattle Seahawks'
if(team_name.lower() == 'Steelers') or (team_name.lower() == 'PIT'):
return 'Pittsburgh Steelers'
if(team_name.lower() == 'Titans') or (team_name.lower() == 'TEN'):
return 'Tennessee Titans'
def __str__(self):
return 'Team[ TEAM_ID = ' + str(self.TEAM_ID) + ', Team_ABBRV = ' + self.TEAM_ABBRV + ', LONG_NAME = ' + self.LONG_NAME + ', CONFERENCE = ' + self.CONFERENCE + ', DIVISION = ' + self.DIVISION + ']'
class Game(object):
__storm_table__ = "GAME"
GAME_ID = Int(primary=True)
SEASON = Unicode()
YEAR = Int()
WEEK = Unicode()
HOME_TEAM = Unicode()
VISITING_TEAM = Unicode()
HOME_TEAM_SCORE = Int()
VISITING_TEAM_SCORE = Int()
HOME_TEAM_RESULT = Unicode()
VISITING_TEAM_RESULT = Unicode()
GAME_DATE = Date()
DAY_OF_WEEK = Unicode()
POST_SEASON_TYPE = Unicode()
def __init__(self, game_id, season, year, week, home_team, visiting_team, home_team_score, visiting_team_score, home_team_result, visiting_team_result, game_date, day_of_week, post_season_type):
self.GAME_ID = int(game_id)
self.SEASON = u'' + season
self.YEAR = int(year)
# I screwed up and set week as a varchar
self.WEEK = u'' + str(week)
self.HOME_TEAM = u'' + home_team
self.VISITING_TEAM = u'' + visiting_team
if(isinstance(home_team_score, float) == False):
home_team_score = float(home_team_score)
if (math.isnan(home_team_score) == False):
self.HOME_TEAM_SCORE = int(home_team_score)
if(isinstance(visiting_team_score, float) == False):
visiting_team_score = float(visiting_team_score)
if (math.isnan(visiting_team_score) == False):
self.VISITING_TEAM_SCORE = int(visiting_team_score)
self.HOME_TEAM_RESULT = u'' + home_team_result
self.VISITING_TEAM_RESULT = u'' + visiting_team_result
if(game_date != ''):
self.GAME_DATE = game_date
self.DAY_OF_WEEK = u'' + day_of_week
if (post_season_type != ''):
self.POST_SEASON_TYPE = u'' + post_season_type
def __str__(self):
return 'Game[ GAME_ID = ' + str(self.GAME_ID) + ', SEASON = ' + str(self.SEASON) + ', YEAR = ' + str(self.YEAR) + ', WEEK = ' + str(self.WEEK) + ', HOME_TEAM = ' + str(self.HOME_TEAM) + ', AWAY_TEAM = ' + str(self.VISITING_TEAM) + ', HOME_TEAM_RESULT = ' + str(self.HOME_TEAM_RESULT) + ', VISITING_TEAM_RESULT = ' + str(self.VISITING_TEAM_RESULT) + ', GAME_DATE = ' + str(self.GAME_DATE) + ', DAY_OF_WEEK = ' + str(self.DAY_OF_WEEK) + ', POST_SEASON_TYPE = ' + str(self.POST_SEASON_TYPE) + ']'
class Roster(object):
__storm_table__ = "ROSTER"
ROSTER_ID = Int(primary=True)
TEAM_ID = Unicode()
GAME_ID = Int()
game = Reference(GAME_ID, Game.GAME_ID)
team = Reference(TEAM_ID, Team.TEAM_ID)
def __init__(self, roster_id, team_id, game_id):
self.ROSTER_ID = int(roster_id)
self.TEAM_ID = u'' + team_id
self.GAME_ID = int(game_id)
def __str__(self):
return 'Roster[ ROSTER_ID = ' + str(self.GAME_ID) + ', SEASON = ' + str(self.SEASON) + ', YEAR = ' + str(self.YEAR) + ', WEEK = ' + str(self.WEEK) + ', HOME_TEAM = ' + str(self.HOME_TEAM) + ', AWAY_TEAM = ' + str(self.VISITING_TEAM) + ', HOME_TEAM_RESULT = ' + str(self.HOME_TEAM_RESULT) + ', VISITING_TEAM_RESULT = ' + str(self.VISITING_TEAM_RESULT) + ', GAME_DATE = ' + str(self.GAME_DATE) + ', DAY_OF_WEEK = ' + str(self.DAY_OF_WEEK) + ', POST_SEASON_TYPE = ' + str(self.POST_SEASON_TYPE) + ']'
class Stats(object):
__storm_table__ = "PLAYER_STATS"
PLAYER_STATS_ID = Int(primary=True)
PLAYER_ID = Int()
ROSTER_ID = Int()
GAME_PLAYED = Unicode()
GAME_STARTED = Unicode()
PASS_COMPLETED = Int()
PASS_ATTEMPTED = Int()
PASS_PERCENTAGE = Float()
PASS_YARDS = Float()
PASS_AVERAGE_YARDS = Float()
PASS_TD = Int()
PASS_INT = Int()
PASS_SACK = Int()
PASS_SACK_YARDS = Float()
PASS_RATING = Float()
RUSH_ATTEMPTS = Int()
RUSH_YARDS = Float()
RUSH_AVERAGE = Float()
RUSH_LONG = Float()
RUSH_TD = Int()
FUMBLES = Int()
FUMBLES_LOST = Int()
RECEIVING_RECEPTIONS = Int()
RECEIVING_YARDS = Float()
RECEIVING_AVERAGE = Float()
RECEIVING_LONG = Float()
RECEIVING_TD = Int()
FG_BLOCKED = Int()
FG_LONG = Float()
FG_ATTEMPTS = Int()
FG_MADE = Int()
FG_PERCENT = Float()
XP_MADE = Int()
XP_ATTEMPTS = Int()
XP_PERCENT = Float()
XP_BLOCKED = Int()
KICKOFFS = Int()
KICKOFFS_AVERAGE = Float()
KICKOFFS_TOUCHBACKS = Int()
KICKOFFS_RETURNED = Int()
KICKOFFS_AVERAGE = Float()
RUSH_LONG_TD = Unicode()
RECEIVING_LONG_TD = Unicode()
KICKOFFS_RETURNED_AVERAGE = Float()
player = Reference(PLAYER_ID, Player.PLAYER_ID)
roster = Reference(ROSTER_ID, Roster.ROSTER_ID)
def __init__(self, player_stats_id, player_id, roster_id, game_result, game_played, game_started, pass_completed, pass_attempted, pass_percentage, pass_yards, pass_average_yards, pass_td, pass_int, pass_sack, pass_sack_yards, pass_rating, rush_attempts, rush_yards, rush_average, rush_long, rush_td, fumbles, fumbles_lost, receiving_receptions, receiving_yards, receiving_average, receiving_long, receiving_td, fg_blocked, fg_long, fg_attempts, fg_made, fg_percent, xp_made, xp_attempts, xp_percent, xp_blocked, kickoffs, kickoffs_average, kickoffs_touchbacks, kickoffs_returned, rush_long_td, receiving_long_td, kickoffs_returned_average):
self.PLAYER_STATS_ID = int(player_stats_id)
self.PLAYER_ID = int(player_id)
self.ROSTER_ID = int(roster_id)
self.GAME_PLAYED = u'' + game_played
self.GAME_STARTED = u'' + game_started
# Passing
self.PASS_COMPLETED = int(pass_completed)
self.PASS_ATTEMPTED = int(pass_attempted)
self.PASS_PERCENTAGE = float(pass_percentage)
self.PASS_YARDS = float(pass_yards)
self.PASS_AVERAGE_YARDS = float(pass_average_yards)
self.PASS_TD = int(pass_td)
self.PASS_INT = int(pass_int)
self.PASS_SACK = int(pass_sack)
self.PASS_SACK_YARDS = float(pass_sack_yards)
self.PASS_RATING = float(pass_rating)
# Rushing
self.RUSH_ATTEMPTS = int(rush_attempts)
self.RUSH_YARDS = float(rush_yards)
self.RUSH_AVERAGE = float(rush_average)
self.RUSH_LONG = float(rush_long)
self.RUSH_LONG_TD = u'' + rush_long_td
self.RUSH_TD = int(rush_td)
self.FUMBLES = int(fumbles)
self.FUMBLES_LOST = int(fumbles_lost)
# Receiving
self.RECEIVING_RECEPTIONS = int(receiving_receptions)
self.RECEIVING_YARDS = float(receiving_yards)
self.RECEIVING_AVERAGE = float(receiving_average)
self.RECEIVING_LONG = float(receiving_long)
self.RECEIVING_LONG_TD = u'' + receiving_long_td
self.RECEIVING_TD = int(receiving_td)
# Field Goals
self.FG_BLOCKED = int(fg_blocked)
self.FG_LONG = float(fg_long)
self.FG_ATTEMPTS = int(fg_attempts)
self.FG_MADE = int(fg_made)
self.FG_PERCENT = float(fg_percent)
# Extra Points
self.XP_MADE = int(xp_made)
self.XP_ATTEMPTS = int(xp_attempts)
self.XP_PERCENT = float(xp_percent)
self.XP_BLOCKED = int(xp_blocked)
# Kickoffs
self.KICKOFFS = int(kickoffs)
self.KICKOFFS_AVERAGE = float(kickoffs_average)
self.KICKOFFS_TOUCHBACKS = int(kickoffs_touchbacks)
self.KICKOFFS_RETURNED = int(kickoffs_returned)
self.KICKOFFS_RETURNED_AVERAGE = float(kickoffs_returned_average)
In [3]:
import datetime
from dateutil.parser import parse
mypath = '/home/bdetweiler/src/Data Science/Brad Evans Raw Data/NFL/players/'
#database = create_database("sqlite:/home/bdetweiler/src/Data Science/brad-evans-analysis/db/nfl.db")
#store = Store(database)
for root, subdirs, files in os.walk(mypath):
for fileName in files:
first_name = ''
middle_name = ''
last_name = ''
name_suffix = ''
compact_name = ''
player_id = int(0)
height = float(0)
height_feet = float(0)
height_inches = float(0)
weight = float(0)
birth_day = int(0)
birth_month = int(0)
birth_year = int(0)
birth_date = ''
birth_city = ''
birth_state = ''
birth_country = ''
high_school_name = ''
high_school_city = ''
high_school_state = ''
college = ''
if 'table' not in fileName: continue
m = re.match("(^[a-z]+)_.*", fileName)
if m is not None:
compact_name = m.group(1)
m = re.match("^[a-z]+_([0-9]+).*", fileName)
if m is not None:
player_id = int(m.group(1))
with open(root + '/' + fileName, 'r') as infile:
count = 1
data = infile.read()
my_list = data.splitlines()
for line in my_list:
m = re.match("^Height: ([0-9]+)-([0-9]+)", line)
if m is not None:
height_feet = float(m.group(1))
height_inches = float(m.group(2))
height = float(height_feet) + float(height_inches / 12)
continue
m = re.match("^Weight: ([0-9]+)", line)
if m is not None:
weight = float(m.group(1))
continue
m = re.match("^Age: ([0-9]+)", line)
if m is not None:
# Don't need this
# age = float(m.group(1))
continue
m = re.match("^Born: ([0-9]+)/([0-9]+)/([0-9]+)", line)
if m is not None:
birth_month = int(m.group(1))
birth_day = int(m.group(2))
birth_year = int(m.group(3))
birth_month_str = str(birth_month)
birth_day_str = str(birth_day)
# Zero pad
if (len(birth_month_str) == 1):
birth_month_str = '0' + birth_month_str;
# Zero pad
if (len(birth_day_str) == 1):
birth_day_str = '0' + birth_day_str;
birth_date = datetime.datetime.strptime(birth_month_str + '/' + birth_day_str + '/' + str(birth_year), "%m/%d/%Y")
continue
m = re.match("^College: (.*)", line)
if m is not None:
college = m.group(1)
continue
m = re.match("^High School: (.*)", line)
if m is not None:
high_school_name = m.group(1)
continue
m = re.match("([A-Za-z .'-]+) ([A-Za-z .'-]+)", line)
if m is not None:
first_name = m.group(1).strip()
last_name = m.group(2).strip()
middle_name = ''
name_suffix = ''
continue
# store.add(Player(player_id, first_name, middle_name, last_name, name_suffix, height, weight, birth_date, birth_city, birth_state, birth_country, high_school_name, high_school_city, high_school_state, college))
#store.flush()
#store.commit()
In [ ]:
### Insert Games
from storm.locals import *
import csv
import re
from pprint import pprint
import os
import datetime
from os import listdir
from os.path import isfile, join
from os import walk
from dateutil.parser import parse
mypath = '/home/bdetweiler/src/Data Science/Brad Evans Raw Data/NFL/games/'
game_id = 0
#database = create_database("sqlite:/home/bdetweiler/src/Data Science/brad-evans-analysis/db/nfl.db")
#store = Store(database)
for root, subdirs, files in os.walk(mypath):
for fileName in files:
year = float('NaN')
week = float('NaN')
month = ''
day = float('NaN')
day_of_week = ''
game_date = ''
home_team = ''
away_team = ''
home_score = float('NaN')
away_score = float('NaN')
home_result = ''
away_result = ''
post_season_type = ''
season = 'REG'
pre = False
post = False
if '.txt' not in fileName: continue
m = re.match("^([1-2][0-9]*)_.*", fileName)
if m is not None:
year = m.group(1)
m = re.match("^[1-2][0-9]*_REG([0-9]+).*", fileName)
n = re.match("^[1-2][0-9]*_PRE([0-9]+).*", fileName)
if m is not None:
week = m.group(1)
elif n is not None:
week = n.group(1)
pre = True
season = 'PRE'
else:
post = True
season = 'POST'
# print(fileName)
# Go through each file
#with open(root + '/' + fileName, 'r') as infile:
# data = infile.read()
# my_list = data.splitlines()
# for line in my_list:
# game_id += 1
# m = re.match("(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)", line)
# if m is not None:
# print(line)
# day_of_week = m.group(1)
# m = re.match(".*(August|September|October|November|December|January|February)", line)
# if m is not None:
# month = m.group(1)
# m = re.match(".*([0-9][0-9])", line)
# if m is not None:
# day = m.group(1)
# game_date = datetime.datetime.strptime(month + ' ' + day + ', ' + year + ' ' + '00:00:00', "%B %d, %Y %H:%M:%S")
# continue;
# m = re.match("WildCardWeekend", line)
# if m is not None:
# post_season_type = 'Wild Card'
# continue;
# m = re.match("DivisionalPlayoffs", line)
# if m is not None:
# post_season_type = 'Divisional Playoffs'
# continue;
# m = re.match("ConferenceChampionships", line)
# if m is not None:
# post_season_type = 'Conference Championships'
# continue;
# m = re.match("ProBowl", line)
# if m is not None:
# post_season_type = 'Pro Bowl'
# continue;
# m = re.match("SuperBowl", line)
# if m is not None:
# post_season_type = 'Super Bowl'
# continue;
# m = re.match("((AFC)|(NFC)|[A-Z][a-z]+|49ers)-([0-9][0-9])-([0-9][0-9])-(AFC|NFC|[A-Z][a-z]+|49ers)", line)
# if m is not None:
# home_team = m.group(4)
# away_team = m.group(1)
# home_score = m.group(3)
# away_score = m.group(2)
# if (home_score > away_score):
# home_result = 'W'
# away_result = 'L'
# elif (home_score < away_score):
# home_result = 'L'
# away_result = 'W'
# else:
# home_result = 'T'
# away_result = 'T'
# game = Game(game_id, season, year, week, home_team, away_team, home_score, away_score, home_result, away_result, game_date, day_of_week, post_season_type)
# store.add(Game(game_id, season, year, week, home_team, away_team, home_score, away_score, home_result, away_result, game_date, day_of_week, post_season_type))
#game_date = ''
#home_team = ''
#away_team = ''
#home_score = float('NaN')
#away_score = float('NaN')
#post_season_type = ''
#pre = False
#post = False
#with open(root + '/' + fileName,'rb') as tsvin:
# tsvin = csv.reader(tsvin, delimiter='\t')
# m = re.match("([a-z]*)_([0-9]*)_(20[0-9][0-9])", fileName)
# if m is not None:
# player = m.group(1)
# player_id = m.group(2)
# year = m.group(3)
# WEEK GAME_DATE OPPONENT RESULT GAME_PLAYED GAME_STARTED PASS_COMPLETED PASS_ATTEMPTED PASS_PERCENTAGE PASS_YARDS PASS_AVERAGE_YARDS PASS_TD PASS_INT PASS_SACK PASS_SACK_YARDS PASS_RATING RUSH_ATTEMPTS RUSH_ATTEMPTS RUSH_AVERAGE RUSH_TD FUMBLES FUMBLES_LOST
# for row in tsvin:
# week = int(row[0])
# m = re.match("([0-9][0-9])/([0-9][0-9])", row[1])
# if m is not None:
# month = m.group(1)
# day = m.group(2)
# m = re.match("@", row[2])
# home_away = ''
# if m is not None:
# month = m.group(1)
# day = m.group(2)
#with open(root + '/' + fileName,'rb') as tsvin:
# tsvin = csv.reader(tsvin, delimiter='\t')
# m = re.match("([a-z]*)_([0-9]*)_(20[0-9][0-9])", fileName)
# if m is not None:
# player = m.group(1)
# player_id = m.group(2)
# year = m.group(3)
# WEEK GAME_DATE OPPONENT RESULT GAME_PLAYED GAME_STARTED PASS_COMPLETED PASS_ATTEMPTED PASS_PERCENTAGE PASS_YARDS PASS_AVERAGE_YARDS PASS_TD PASS_INT PASS_SACK PASS_SACK_YARDS PASS_RATING RUSH_ATTEMPTS RUSH_ATTEMPTS RUSH_AVERAGE RUSH_TD FUMBLES FUMBLES_LOST
# for row in tsvin:
# week = int(row[0])
# m = re.match("([0-9][0-9])/([0-9][0-9])", row[1])
# if m is not None:
# month = m.group(1)
# day = m.group(2)
# m = re.match("@", row[2])
# home_away = ''
# if m is not None:
# month = m.group(1)
# day = m.group(2)
#store.flush()
#store.commit()
In [4]:
from storm.locals import *
import csv
import re
from pprint import pprint
import os
import datetime
from os import listdir
from os.path import isfile, join
from os import walk
from dateutil.parser import parse
#### Insert all rosters
#database = create_database("sqlite:/home/bdetweiler/src/Data Science/brad-evans-analysis/db/nfl.db")
#store = Store(database)
#games = store.find(Game)
#roster_id = 0
#for game in games:
# roster_id += 1
# store.add(Roster(roster_id, game.HOME_TEAM, game.GAME_ID))
# roster_id += 1
# store.add(Roster(roster_id, game.VISITING_TEAM, game.GAME_ID))
#print('committing...')
#store.flush()
#store.commit()
In [ ]:
##### Insert QBs
from storm.locals import *
import csv
import re
from pprint import pprint
import os
import datetime
from os import listdir
from os.path import isfile, join
from os import walk
from dateutil.parser import parse
mypath = '/home/bdetweiler/src/Data Science/Brad Evans Raw Data/NFL/QB/'
database = create_database("sqlite:/home/bdetweiler/src/Data Science/brad-evans-analysis/nfl-database/nfl.db")
store = Store(database)
player_stats_id = 0
for root, subdirs, files in os.walk(mypath):
for fileName in files:
first_name = ''
middle_name = ''
last_name = ''
name_suffix = ''
compact_name = ''
player_id = int(0)
height = float(0)
height_feet = float(0)
height_inches = float(0)
weight = float(0)
birth_day = int(0)
birth_month = int(0)
birth_year = int(0)
birth_date = ''
birth_city = ''
birth_state = ''
birth_country = ''
high_school_name = ''
high_school_city = ''
high_school_state = ''
college = ''
if 'tsv' not in fileName: continue
if 'header.tsv' in fileName: continue
if 'tsv.sh' in fileName: continue
m = re.match("^[a-z'.-]+_([0-9]+)_([1-2][0-9]+).tsv", fileName)
if m is not None:
player_id = int(m.group(1))
year = int(m.group(2))
#print(str(player_id) + ' - ' + str(year))
with open(root + '/' + fileName, 'r') as tsvin:
count = 1
team = None
game = None
tsvin = csv.reader(tsvin, delimiter='\t')
print(fileName)
for line in tsvin:
#print('--------------------- ' + fileName + ' ---------------------------')
if (len(line) == 0): continue
#print(line)
opponent_team = None
player_stats_id += 1
pass_completed = int(0)
pass_attempted = int(0)
pass_percentage = float(0)
pass_yards = float(0)
pass_average_yards = float(0)
pass_td = int(0)
pass_int = int(0)
pass_sack = int(0)
pass_sack_yards = float(0)
pass_rating = float(0)
rush_attempts = int(0)
rush_yards = float(0)
rush_average = float(0)
rush_long = float(0)
rush_td = int(0)
fumbles = int(0)
fumbles_lost = int(0)
receiving_receptions = int(0)
receiving_yards = float(0)
receiving_average = float(0)
receiving_long = float(0)
receiving_td = int(0)
fg_blocked = int(0)
fg_long = float(0)
fg_attempts = int(0)
fg_made = int(0)
fg_percent = float(0)
xp_made = int(0)
xp_attempts = int(0)
xp_percent = float(0)
xp_blocked = int(0)
kickoffs = int(0)
kickoffs_average = float(0)
kickoffs_touchbacks = int(0)
kickoffs_returned = int(0)
week = int(line[0])
m = re.match("^([0-9][0-9])/([0-9][0-9])", line[1])
if m is not None:
month = m.group(1)
day = m.group(2)
game_date = datetime.datetime.strptime(str(month) + '/' + str(day) + '/' + str(year), "%m/%d/%Y")
away = False
opponent = ''
m = re.match("^@.*", line[2])
if m is not None:
away = True
n = re.match("^@(.*)", line[2])
if n is not None:
opponent = n.group(1)
else:
opponent = line[2]
#print ('opponent line[2] = ' + line[2])
if (opponent == 'JAC'):
opponent = 'JAX'
if (opponent == 'NPR'):
opponent = 'NFC'
if opponent_team is None:
#print('looking for opponent, ' + opponent)
opponent_team = store.find(Team, Team.TEAM_ABBRV == unicode(opponent)).one()
#print("select * from team where team_abbrv = '" + opponent + "';")
#test = store.execute("select * from team where team_abbrv = '" + opponent + "';").get_one()
#print(test)
#print(opponent_team)
else:
print(opponent)
print('could not get opponent team')
# Had trouble with dates, but this works, so...
if (opponent_team is not None):
#game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('opponent_team = ' + opponent_team.TEAM_ID + ', game_date = ' + str(game_date))
if (away):
#print('looking up game by date [' + str(game_date) + '] and home_team = [' + opponent_team.TEAM_ID + ']')
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date) + "' AND HOME_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('game (visitor) = ')
#print(game)
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
game_result = game.VISITING_TEAM_RESULT
else:
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date.strftime('%Y-%m-%d')) + "' AND VISITING_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
#print('looking up game by date [' + str(game_date) + '] and visiting_team = [' + opponent_team.TEAM_ID + ']')
# XXX: Not finding Colts, 2013-01-11
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.VISITING_TEAM == opponent_team.TEAM_ID)).one()
#print('game (home) = ')
#print(game)
#store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
#print(game)
#print('hometeam = ' + str(game.HOME_TEAM))
#team = store.execute("SELECT * FROM TEAM WHERE TEAM_ID = '" + str(game.HOME_TEAM) + "'").get_one()
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.HOME_TEAM).one()
game_result = game.HOME_TEAM_RESULT
if (line[4] != '--') and (line[4] != ''):
game_played = str(line[4])
if (line[5] != '--') and (line[5] != ''):
game_started = str(line[5])
if (line[6] != '--') and (line[6] != ''):
pass_completed = int(line[6])
if (line[7] != '--') and (line[7] != ''):
pass_attempted = int(line[7])
if (line[8] != '--') and (line[8] != ''):
pass_percentage = float(line[8])
if (line[9] != '--') and (line[9] != ''):
pass_yards = float(line[9])
if (line[10] != '--') and (line[10] != ''):
pass_average = float(line[10])
if (line[11] != '--') and (line[11] != ''):
pass_td = int(line[11])
if (line[12] != '--') and (line[12] != ''):
pass_interceptions = int(line[12])
if (line[13] != '--') and (line[13] != ''):
pass_sacks = int(line[13])
if (line[14] != '--') and (line[14] != ''):
pass_sack_yards = float(line[14])
if (line[15] != '--') and (line[15] != ''):
pass_rating = float(line[15])
if (line[16] != '--') and (line[16] != ''):
rush_attempts = int(line[16])
if (line[17] != '--') and (line[17] != ''):
rush_yards = float(line[17])
if (line[18] != '--') and (line[18] != ''):
rush_average = float(line[18])
if (line[19] != '--') and (line[19] != ''):
rush_td = int(line[19])
if (line[20] != '--') and (line[20] != ''):
rush_fum = int(line[20])
if (line[21] != '--') and (line[21] != ''):
rush_fum_lost = int(line[21])
#pass_completed = int(0)
#pass_attempted = int(0)
#pass_percentage = float(0)
#pass_yards = float(0)
#pass_average_yards = float(0)
##pass_td = int(0)
#pass_int = int(0)
#pass_sack = int(0)
#pass_sack_yards = float(0)
#pass_rating = float(0)
#rush_attempts = int(0)
#rush_yards = float(0)
#rush_average = float(0)
#rush_long = float(0)
#rush_td = int(0)
#fumbles = int(0)
#fumbles_lost = int(0)
receiving_receptions = int(0)
receiving_yards = float(0)
receiving_average = float(0)
receiving_long = float(0)
receiving_td = int(0)
fg_blocked = int(0)
fg_long = float(0)
fg_attempts = int(0)
fg_made = int(0)
fg_percent = float(0)
xp_made = int(0)
xp_attempts = int(0)
xp_percent = float(0)
xp_blocked = int(0)
kickoffs = int(0)
kickoffs_average = float(0)
kickoffs_touchbacks = int(0)
kickoffs_returned = int(0)
#roster = store.find(Roster, And(Team.TEAM_ID == team.TEAM_ID, Game.GAME_ID == game.GAME_ID)).one()
roster = store.find(Roster, And(Roster.TEAM_ID == team.TEAM_ID, Roster.GAME_ID == game.GAME_ID)).one()
#for roster in rosters:
# print('for team_id ' + team.TEAM_ID + ' and game_id = ' + str(game.GAME_ID))
if (roster is None):
print('roster not found for TEAM ' + team.TEAM_ID + ' and game ' + game.GAME_ID)
stats = Stats(player_stats_id, player_id, roster.ROSTER_ID, game_result, u'' + game_played, game_started, pass_completed, pass_attempted, pass_percentage, pass_yards, pass_average_yards, pass_td, pass_int, pass_sack, pass_sack_yards, pass_rating, rush_attempts, rush_yards, rush_average, rush_long, rush_td, fumbles, fumbles_lost, receiving_receptions, receiving_yards, receiving_average, receiving_long, receiving_td, fg_blocked, fg_long, fg_attempts, fg_made, fg_percent, xp_made, xp_attempts, xp_percent, xp_blocked, kickoffs, kickoffs_average, kickoffs_touchbacks, kickoffs_returned, rush_long_td, receiving_long_td)
#store.add(stats)
#m = re.match("^Height: ([0-9]+)-([0-9]+)", line)
#if m is not None:
# height_feet = float(m.group(1))
# height_inches = float(m.group(2))
# height = float(height_feet) + float(height_inches / 12)
# continue
# store.add(Player(player_id, first_name, middle_name, last_name, name_suffix, height, weight, birth_date, birth_city, birth_state, birth_country, high_school_name, high_school_city, high_school_state, college))
print('committing...')
#store.flush()
#store.commit()
Note: Terrell Pryor is marked as a WR on the NFL website, even though he played QB for a while.
In [ ]:
### Insert WRs
from storm.locals import *
import csv
import re
from pprint import pprint
import os
import datetime
from os import listdir
from os.path import isfile, join
from os import walk
from dateutil.parser import parse
mypath = '/home/bdetweiler/src/Data Science/Brad Evans Raw Data/NFL/WR/'
#database = create_database("sqlite:/home/bdetweiler/src/Data Science/brad-evans-analysis/nfl-database/nfl.db")
store = Store(database)
player_stats_id = 10793
for root, subdirs, files in os.walk(mypath):
for fileName in files:
first_name = ''
middle_name = ''
last_name = ''
name_suffix = ''
compact_name = ''
player_id = int(0)
height = float(0)
height_feet = float(0)
height_inches = float(0)
weight = float(0)
birth_day = int(0)
birth_month = int(0)
birth_year = int(0)
birth_date = ''
birth_city = ''
birth_state = ''
birth_country = ''
high_school_name = ''
high_school_city = ''
high_school_state = ''
college = ''
if 'tsv' not in fileName: continue
if 'header.tsv' in fileName: continue
if 'tsv.sh' in fileName: continue
m = re.match("^[a-z'.-]+_([0-9]+)_([1-2][0-9]+).tsv", fileName)
if m is not None:
player_id = int(m.group(1))
year = int(m.group(2))
#print(str(player_id) + ' - ' + str(year))
with open(root + '/' + fileName, 'r') as tsvin:
count = 1
team = None
game = None
tsvin = csv.reader(tsvin, delimiter='\t')
print(fileName)
for line in tsvin:
#print('--------------------- ' + fileName + ' ---------------------------')
if (len(line) == 0): continue
#print(line)
opponent_team = None
player_stats_id += 1
pass_completed = int(0)
pass_attempted = int(0)
pass_percentage = float(0)
pass_yards = float(0)
pass_average_yards = float(0)
pass_td = int(0)
pass_int = int(0)
pass_sack = int(0)
pass_sack_yards = float(0)
pass_rating = float(0)
rush_attempts = int(0)
rush_yards = float(0)
rush_average = float(0)
rush_long = float(0)
rush_long_td = str('0')
rush_td = int(0)
fumbles = int(0)
fumbles_lost = int(0)
receiving_receptions = int(0)
receiving_yards = float(0)
receiving_average = float(0)
receiving_long = float(0)
receiving_long_td = str('0')
receiving_td = int(0)
fg_blocked = int(0)
fg_long = float(0)
fg_attempts = int(0)
fg_made = int(0)
fg_percent = float(0)
xp_made = int(0)
xp_attempts = int(0)
xp_percent = float(0)
xp_blocked = int(0)
kickoffs = int(0)
kickoffs_average = float(0)
kickoffs_touchbacks = int(0)
kickoffs_returned = int(0)
week = int(line[0])
m = re.match("^([0-9][0-9])/([0-9][0-9])", line[1])
if m is not None:
month = m.group(1)
day = m.group(2)
game_date = datetime.datetime.strptime(str(month) + '/' + str(day) + '/' + str(year), "%m/%d/%Y")
away = False
opponent = ''
m = re.match("^@.*", line[2])
if m is not None:
away = True
n = re.match("^@(.*)", line[2])
if n is not None:
opponent = n.group(1)
else:
opponent = line[2]
#print ('opponent line[2] = ' + line[2])
if (opponent == 'JAC'):
opponent = 'JAX'
if (opponent == 'NPR'):
opponent = 'NFC'
if opponent_team is None:
#print('looking for opponent, ' + opponent)
opponent_team = store.find(Team, Team.TEAM_ABBRV == unicode(opponent)).one()
#print("select * from team where team_abbrv = '" + opponent + "';")
#test = store.execute("select * from team where team_abbrv = '" + opponent + "';").get_one()
#print(test)
#print(opponent_team)
else:
print(opponent)
print('could not get opponent team')
# Had trouble with dates, but this works, so...
if (opponent_team is not None):
#game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('opponent_team = ' + opponent_team.TEAM_ID + ', game_date = ' + str(game_date))
if (away):
#print('looking up game by date [' + str(game_date) + '] and home_team = [' + opponent_team.TEAM_ID + ']')
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date) + "' AND HOME_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('game (visitor) = ')
#print(game)
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
game_result = game.VISITING_TEAM_RESULT
else:
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date.strftime('%Y-%m-%d')) + "' AND VISITING_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
#print('looking up game by date [' + str(game_date) + '] and visiting_team = [' + opponent_team.TEAM_ID + ']')
# XXX: Not finding Colts, 2013-01-11
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.VISITING_TEAM == opponent_team.TEAM_ID)).one()
#print('game (home) = ')
#print(game)
#store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
#print(game)
#print('hometeam = ' + str(game.HOME_TEAM))
#team = store.execute("SELECT * FROM TEAM WHERE TEAM_ID = '" + str(game.HOME_TEAM) + "'").get_one()
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.HOME_TEAM).one()
game_result = game.HOME_TEAM_RESULT
if (line[4] != '--') and (line[4] != ''):
game_played = str(line[4])
if (line[5] != '--') and (line[5] != ''):
game_started = str(line[5])
if (line[6] != '--') and (line[6] != ''):
receiving_receptions = int(line[6])
if (line[7] != '--') and (line[7] != ''):
receiving_yards = float(line[7])
if (line[8] != '--') and (line[8] != ''):
receiving_average = float(line[8])
# If there's a T following the digits, it means they got a touchdown on their longest reception
if (line[9] != '--') and (line[9] != ''):
m = re.match("^([0-9]+)T", line[9])
if m is not None:
receiving_long = float(m.group(1))
receiving_long_td = '1'
else:
receiving_long = float(line[9])
if (line[10] != '--') and (line[10] != ''):
receiving_td = int(line[10])
if (line[11] != '--') and (line[11] != ''):
rush_attempts = int(line[11])
if (line[12] != '--') and (line[12] != ''):
rush_yards = float(line[12])
if (line[13] != '--') and (line[13] != ''):
rush_average = float(line[13])
# If there's a T following the digits, it means they got a touchdown on their longest run
if (line[14] != '--') and (line[14] != ''):
m = re.match("^([0-9]+)T", line[14])
if m is not None:
rush_long = float(m.group(1))
rush_long_td = '1'
else:
rush_long = float(line[14])
if (line[15] != '--') and (line[15] != ''):
fumbles = int(line[15])
if (line[16] != '--') and (line[16] != ''):
fumbles_lost = int(line[16])
#if (line[17] != '--') and (line[17] != ''):
# rush_yards = float(line[17])
#if (line[18] != '--') and (line[18] != ''):
# rush_average = float(line[18])
#if (line[19] != '--') and (line[19] != ''):
# rush_td = int(line[19])
#if (line[20] != '--') and (line[20] != ''):
# rush_fum = int(line[20])
#if (line[21] != '--') and (line[21] != ''):
# rush_fum_lost = int(line[21])
pass_completed = int(0)
pass_attempted = int(0)
pass_percentage = float(0)
pass_yards = float(0)
pass_average_yards = float(0)
pass_td = int(0)
pass_int = int(0)
pass_sack = int(0)
pass_sack_yards = float(0)
pass_rating = float(0)
#rush_attempts = int(0)
#rush_yards = float(0)
#rush_average = float(0)
#rush_long = float(0)
#rush_long_td = str('0')
#rush_td = int(0)
#fumbles = int(0)
#fumbles_lost = int(0)
#receiving_receptions = int(0)
#receiving_yards = float(0)
#receiving_average = float(0)
#receiving_long = float(0)
#receiving_long_td = str('0')
#receiving_td = int(0)
fg_blocked = int(0)
fg_long = float(0)
fg_attempts = int(0)
fg_made = int(0)
fg_percent = float(0)
xp_made = int(0)
xp_attempts = int(0)
xp_percent = float(0)
xp_blocked = int(0)
kickoffs = int(0)
kickoffs_average = float(0)
kickoffs_touchbacks = int(0)
kickoffs_returned = int(0)
#roster = store.find(Roster, And(Team.TEAM_ID == team.TEAM_ID, Game.GAME_ID == game.GAME_ID)).one()
roster = store.find(Roster, And(Roster.TEAM_ID == team.TEAM_ID, Roster.GAME_ID == game.GAME_ID)).one()
#for roster in rosters:
# print('for team_id ' + team.TEAM_ID + ' and game_id = ' + str(game.GAME_ID))
if (roster is None):
print('roster not found for TEAM ' + team.TEAM_ID + ' and game ' + game.GAME_ID)
stats = Stats(player_stats_id, player_id, roster.ROSTER_ID, game_result, u'' + game_played, game_started, pass_completed, pass_attempted, pass_percentage, pass_yards, pass_average_yards, pass_td, pass_int, pass_sack, pass_sack_yards, pass_rating, rush_attempts, rush_yards, rush_average, rush_long, rush_td, fumbles, fumbles_lost, receiving_receptions, receiving_yards, receiving_average, receiving_long, receiving_td, fg_blocked, fg_long, fg_attempts, fg_made, fg_percent, xp_made, xp_attempts, xp_percent, xp_blocked, kickoffs, kickoffs_average, kickoffs_touchbacks, kickoffs_returned, rush_long_td, receiving_long_td)
store.add(stats)
print('committing...')
#store.flush()
#store.rollback()
#store.commit()
Ran into a few issues with WRs. Joe Webb was placed in this directory, and he is listed as a QB. The longs (receiving, rushing) columns can have 'T' in them,
In [ ]:
### Insert RBs
from storm.locals import *
import csv
import re
from pprint import pprint
import os
import datetime
from os import listdir
from os.path import isfile, join
from os import walk
from dateutil.parser import parse
mypath = '/home/bdetweiler/src/Data Science/Brad Evans Raw Data/NFL/RB/'
database = create_database("sqlite:/home/bdetweiler/src/Data Science/brad-evans-analysis/nfl-database/nfl.db")
store = Store(database)
player_stats_id = 36325
for root, subdirs, files in os.walk(mypath):
for fileName in files:
first_name = ''
middle_name = ''
last_name = ''
name_suffix = ''
compact_name = ''
player_id = int(0)
height = float(0)
height_feet = float(0)
height_inches = float(0)
weight = float(0)
birth_day = int(0)
birth_month = int(0)
birth_year = int(0)
birth_date = ''
birth_city = ''
birth_state = ''
birth_country = ''
high_school_name = ''
high_school_city = ''
high_school_state = ''
college = ''
if 'tsv' not in fileName: continue
if 'header.tsv' in fileName: continue
if 'tsv.sh' in fileName: continue
m = re.match("^[a-z'.-]+_([0-9]+)_([1-2][0-9]+).tsv", fileName)
if m is not None:
player_id = int(m.group(1))
year = int(m.group(2))
#print(str(player_id) + ' - ' + str(year))
with open(root + '/' + fileName, 'r') as tsvin:
count = 1
team = None
game = None
tsvin = csv.reader(tsvin, delimiter='\t')
print(fileName)
for line in tsvin:
#print('--------------------- ' + fileName + ' ---------------------------')
if (len(line) == 0): continue
#print(line)
opponent_team = None
player_stats_id += 1
pass_completed = int(0)
pass_attempted = int(0)
pass_percentage = float(0)
pass_yards = float(0)
pass_average_yards = float(0)
pass_td = int(0)
pass_int = int(0)
pass_sack = int(0)
pass_sack_yards = float(0)
pass_rating = float(0)
rush_attempts = int(0)
rush_yards = float(0)
rush_average = float(0)
rush_long = float(0)
rush_long_td = str('0')
rush_td = int(0)
fumbles = int(0)
fumbles_lost = int(0)
receiving_receptions = int(0)
receiving_yards = float(0)
receiving_average = float(0)
receiving_long = float(0)
receiving_long_td = str('0')
receiving_td = int(0)
fg_blocked = int(0)
fg_long = float(0)
fg_attempts = int(0)
fg_made = int(0)
fg_percent = float(0)
xp_made = int(0)
xp_attempts = int(0)
xp_percent = float(0)
xp_blocked = int(0)
kickoffs = int(0)
kickoffs_average = float(0)
kickoffs_touchbacks = int(0)
kickoffs_returned = int(0)
week = int(line[0])
m = re.match("^([0-9][0-9])/([0-9][0-9])", line[1])
if m is not None:
month = m.group(1)
day = m.group(2)
game_date = datetime.datetime.strptime(str(month) + '/' + str(day) + '/' + str(year), "%m/%d/%Y")
away = False
opponent = ''
m = re.match("^@.*", line[2])
if m is not None:
away = True
n = re.match("^@(.*)", line[2])
if n is not None:
opponent = n.group(1)
else:
opponent = line[2]
#print ('opponent line[2] = ' + line[2])
if (opponent == 'JAC'):
opponent = 'JAX'
if (opponent == 'NPR'):
opponent = 'NFC'
if opponent_team is None:
#print('looking for opponent, ' + opponent)
opponent_team = store.find(Team, Team.TEAM_ABBRV == unicode(opponent)).one()
#print("select * from team where team_abbrv = '" + opponent + "';")
#test = store.execute("select * from team where team_abbrv = '" + opponent + "';").get_one()
#print(test)
#print(opponent_team)
else:
print(opponent)
print('could not get opponent team')
# Had trouble with dates, but this works, so...
if (opponent_team is not None):
#game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('opponent_team = ' + opponent_team.TEAM_ID + ', game_date = ' + str(game_date))
if (away):
#print('looking up game by date [' + str(game_date) + '] and home_team = [' + opponent_team.TEAM_ID + ']')
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date) + "' AND HOME_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('game (visitor) = ')
#print(game)
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
game_result = game.VISITING_TEAM_RESULT
else:
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date.strftime('%Y-%m-%d')) + "' AND VISITING_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
#print('looking up game by date [' + str(game_date) + '] and visiting_team = [' + opponent_team.TEAM_ID + ']')
# XXX: Not finding Colts, 2013-01-11
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.VISITING_TEAM == opponent_team.TEAM_ID)).one()
#print('game (home) = ')
#print(game)
#store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
#print(game)
#print('hometeam = ' + str(game.HOME_TEAM))
#team = store.execute("SELECT * FROM TEAM WHERE TEAM_ID = '" + str(game.HOME_TEAM) + "'").get_one()
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.HOME_TEAM).one()
game_result = game.HOME_TEAM_RESULT
if (line[4] != '--') and (line[4] != ''):
game_played = str(line[4])
if (line[5] != '--') and (line[5] != ''):
game_started = str(line[5])
if (line[6] != '--') and (line[6] != ''):
receiving_receptions = int(line[6])
if (line[7] != '--') and (line[7] != ''):
receiving_yards = float(line[7])
if (line[8] != '--') and (line[8] != ''):
receiving_average = float(line[8])
# If there's a T following the digits, it means they got a touchdown on their longest reception
if (line[9] != '--') and (line[9] != ''):
m = re.match("^([0-9]+)T", line[9])
if m is not None:
receiving_long = float(m.group(1))
receiving_long_td = '1'
else:
receiving_long = float(line[9])
if (line[10] != '--') and (line[10] != ''):
receiving_td = int(line[10])
if (line[11] != '--') and (line[11] != ''):
rush_attempts = int(line[11])
if (line[12] != '--') and (line[12] != ''):
rush_yards = float(line[12])
if (line[13] != '--') and (line[13] != ''):
rush_average = float(line[13])
# If there's a T following the digits, it means they got a touchdown on their longest run
if (line[14] != '--') and (line[14] != ''):
m = re.match("^([0-9]+)T", line[14])
if m is not None:
rush_long = float(m.group(1))
rush_long_td = '1'
else:
rush_long = float(line[14])
if (line[15] != '--') and (line[15] != ''):
fumbles = int(line[15])
if (line[16] != '--') and (line[16] != ''):
fumbles_lost = int(line[16])
#if (line[17] != '--') and (line[17] != ''):
# rush_yards = float(line[17])
#if (line[18] != '--') and (line[18] != ''):
# rush_average = float(line[18])
#if (line[19] != '--') and (line[19] != ''):
# rush_td = int(line[19])
#if (line[20] != '--') and (line[20] != ''):
# rush_fum = int(line[20])
#if (line[21] != '--') and (line[21] != ''):
# rush_fum_lost = int(line[21])
pass_completed = int(0)
pass_attempted = int(0)
pass_percentage = float(0)
pass_yards = float(0)
pass_average_yards = float(0)
pass_td = int(0)
pass_int = int(0)
pass_sack = int(0)
pass_sack_yards = float(0)
pass_rating = float(0)
#rush_attempts = int(0)
#rush_yards = float(0)
#rush_average = float(0)
#rush_long = float(0)
#rush_long_td = str('0')
#rush_td = int(0)
#fumbles = int(0)
#fumbles_lost = int(0)
#receiving_receptions = int(0)
#receiving_yards = float(0)
#receiving_average = float(0)
#receiving_long = float(0)
#receiving_long_td = str('0')
#receiving_td = int(0)
fg_blocked = int(0)
fg_long = float(0)
fg_attempts = int(0)
fg_made = int(0)
fg_percent = float(0)
xp_made = int(0)
xp_attempts = int(0)
xp_percent = float(0)
xp_blocked = int(0)
kickoffs = int(0)
kickoffs_average = float(0)
kickoffs_touchbacks = int(0)
kickoffs_returned = int(0)
#roster = store.find(Roster, And(Team.TEAM_ID == team.TEAM_ID, Game.GAME_ID == game.GAME_ID)).one()
roster = store.find(Roster, And(Roster.TEAM_ID == team.TEAM_ID, Roster.GAME_ID == game.GAME_ID)).one()
#for roster in rosters:
# print('for team_id ' + team.TEAM_ID + ' and game_id = ' + str(game.GAME_ID))
if (roster is None):
print('roster not found for TEAM ' + team.TEAM_ID + ' and game ' + game.GAME_ID)
stats = Stats(player_stats_id, player_id, roster.ROSTER_ID, game_result, u'' + game_played, game_started, pass_completed, pass_attempted, pass_percentage, pass_yards, pass_average_yards, pass_td, pass_int, pass_sack, pass_sack_yards, pass_rating, rush_attempts, rush_yards, rush_average, rush_long, rush_td, fumbles, fumbles_lost, receiving_receptions, receiving_yards, receiving_average, receiving_long, receiving_td, fg_blocked, fg_long, fg_attempts, fg_made, fg_percent, xp_made, xp_attempts, xp_percent, xp_blocked, kickoffs, kickoffs_average, kickoffs_touchbacks, kickoffs_returned, rush_long_td, receiving_long_td)
store.add(stats)
print('committing...')
#store.flush()
#store.rollback()
#store.commit()
In [ ]:
### Insert TEs
from storm.locals import *
import csv
import re
from pprint import pprint
import os
import datetime
from os import listdir
from os.path import isfile, join
from os import walk
from dateutil.parser import parse
mypath = '/home/bdetweiler/src/Data Science/Brad Evans Raw Data/NFL/TE/'
#database = create_database("sqlite:/home/bdetweiler/src/Data Science/brad-evans-analysis/nfl-database/nfl.db")
store = Store(database)
player_stats_id = 56174
for root, subdirs, files in os.walk(mypath):
for fileName in files:
first_name = ''
middle_name = ''
last_name = ''
name_suffix = ''
compact_name = ''
player_id = int(0)
height = float(0)
height_feet = float(0)
height_inches = float(0)
weight = float(0)
birth_day = int(0)
birth_month = int(0)
birth_year = int(0)
birth_date = ''
birth_city = ''
birth_state = ''
birth_country = ''
high_school_name = ''
high_school_city = ''
high_school_state = ''
college = ''
if 'tsv' not in fileName: continue
if 'header.tsv' in fileName: continue
if 'tsv.sh' in fileName: continue
m = re.match("^[a-z'.-]+_([0-9]+)_([1-2][0-9]+).tsv", fileName)
if m is not None:
player_id = int(m.group(1))
year = int(m.group(2))
#print(str(player_id) + ' - ' + str(year))
with open(root + '/' + fileName, 'r') as tsvin:
count = 1
team = None
game = None
tsvin = csv.reader(tsvin, delimiter='\t')
print(fileName)
for line in tsvin:
#print('--------------------- ' + fileName + ' ---------------------------')
if (len(line) == 0): continue
#print(line)
opponent_team = None
player_stats_id += 1
pass_completed = int(0)
pass_attempted = int(0)
pass_percentage = float(0)
pass_yards = float(0)
pass_average_yards = float(0)
pass_td = int(0)
pass_int = int(0)
pass_sack = int(0)
pass_sack_yards = float(0)
pass_rating = float(0)
rush_attempts = int(0)
rush_yards = float(0)
rush_average = float(0)
rush_long = float(0)
rush_long_td = str('0')
rush_td = int(0)
fumbles = int(0)
fumbles_lost = int(0)
receiving_receptions = int(0)
receiving_yards = float(0)
receiving_average = float(0)
receiving_long = float(0)
receiving_long_td = str('0')
receiving_td = int(0)
fg_blocked = int(0)
fg_long = float(0)
fg_attempts = int(0)
fg_made = int(0)
fg_percent = float(0)
xp_made = int(0)
xp_attempts = int(0)
xp_percent = float(0)
xp_blocked = int(0)
kickoffs = int(0)
kickoffs_average = float(0)
kickoffs_touchbacks = int(0)
kickoffs_returned = int(0)
week = int(line[0])
m = re.match("^([0-9][0-9])/([0-9][0-9])", line[1])
if m is not None:
month = m.group(1)
day = m.group(2)
game_date = datetime.datetime.strptime(str(month) + '/' + str(day) + '/' + str(year), "%m/%d/%Y")
away = False
opponent = ''
m = re.match("^@.*", line[2])
if m is not None:
away = True
n = re.match("^@(.*)", line[2])
if n is not None:
opponent = n.group(1)
else:
opponent = line[2]
#print ('opponent line[2] = ' + line[2])
if (opponent == 'JAC'):
opponent = 'JAX'
if (opponent == 'NPR'):
opponent = 'NFC'
# print('game_date = ' + str(game_date) + ' opponent = ' + opponent)
if opponent_team is None:
#print('looking for opponent, ' + opponent)
opponent_team = store.find(Team, Team.TEAM_ABBRV == unicode(opponent)).one()
#print("select * from team where team_abbrv = '" + opponent + "';")
#test = store.execute("select * from team where team_abbrv = '" + opponent + "';").get_one()
#print(test)
#print(opponent_team)
else:
print(opponent)
print('could not get opponent team')
# Had trouble with dates, but this works, so...
if (opponent_team is not None):
#game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('opponent_team = ' + opponent_team.TEAM_ID + ', game_date = ' + str(game_date))
if (away):
#print('looking up game by date [' + str(game_date) + '] and home_team = [' + opponent_team.TEAM_ID + ']')
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date) + "' AND HOME_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('game (visitor) = ')
#print(game)
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
game_result = game.VISITING_TEAM_RESULT
else:
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date.strftime('%Y-%m-%d')) + "' AND VISITING_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
#print('looking up game by date [' + str(game_date) + '] and visiting_team = [' + opponent_team.TEAM_ID + ']')
# XXX: Not finding Colts, 2013-01-11
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.VISITING_TEAM == opponent_team.TEAM_ID)).one()
#print('game (home) = ')
#print(game)
#store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
#print(game)
#print('hometeam = ' + str(game.HOME_TEAM))
#team = store.execute("SELECT * FROM TEAM WHERE TEAM_ID = '" + str(game.HOME_TEAM) + "'").get_one()
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.HOME_TEAM).one()
game_result = game.HOME_TEAM_RESULT
if (line[4] != '--') and (line[4] != ''):
game_played = str(line[4])
if (line[5] != '--') and (line[5] != ''):
game_started = str(line[5])
if (line[6] != '--') and (line[6] != ''):
receiving_receptions = int(line[6])
if (line[7] != '--') and (line[7] != ''):
receiving_yards = float(line[7])
if (line[8] != '--') and (line[8] != ''):
receiving_average = float(line[8])
# If there's a T following the digits, it means they got a touchdown on their longest reception
if (line[9] != '--') and (line[9] != ''):
m = re.match("^([0-9]+)T", line[9])
if m is not None:
receiving_long = float(m.group(1))
receiving_long_td = '1'
else:
receiving_long = float(line[9])
if (line[10] != '--') and (line[10] != ''):
receiving_td = int(line[10])
if (line[11] != '--') and (line[11] != ''):
rush_attempts = int(line[11])
if (line[12] != '--') and (line[12] != ''):
rush_yards = float(line[12])
if (line[13] != '--') and (line[13] != ''):
rush_average = float(line[13])
# If there's a T following the digits, it means they got a touchdown on their longest run
if (line[14] != '--') and (line[14] != ''):
m = re.match("^([0-9]+)T", line[14])
if m is not None:
rush_long = float(m.group(1))
rush_long_td = '1'
else:
rush_long = float(line[14])
if (line[15] != '--') and (line[15] != ''):
fumbles = int(line[15])
if (line[16] != '--') and (line[16] != ''):
fumbles_lost = int(line[16])
#if (line[17] != '--') and (line[17] != ''):
# rush_yards = float(line[17])
#if (line[18] != '--') and (line[18] != ''):
# rush_average = float(line[18])
#if (line[19] != '--') and (line[19] != ''):
# rush_td = int(line[19])
#if (line[20] != '--') and (line[20] != ''):
# rush_fum = int(line[20])
#if (line[21] != '--') and (line[21] != ''):
# rush_fum_lost = int(line[21])
pass_completed = int(0)
pass_attempted = int(0)
pass_percentage = float(0)
pass_yards = float(0)
pass_average_yards = float(0)
pass_td = int(0)
pass_int = int(0)
pass_sack = int(0)
pass_sack_yards = float(0)
pass_rating = float(0)
#rush_attempts = int(0)
#rush_yards = float(0)
#rush_average = float(0)
#rush_long = float(0)
#rush_long_td = str('0')
#rush_td = int(0)
#fumbles = int(0)
#fumbles_lost = int(0)
#receiving_receptions = int(0)
#receiving_yards = float(0)
#receiving_average = float(0)
#receiving_long = float(0)
#receiving_long_td = str('0')
#receiving_td = int(0)
fg_blocked = int(0)
fg_long = float(0)
fg_attempts = int(0)
fg_made = int(0)
fg_percent = float(0)
xp_made = int(0)
xp_attempts = int(0)
xp_percent = float(0)
xp_blocked = int(0)
kickoffs = int(0)
kickoffs_average = float(0)
kickoffs_touchbacks = int(0)
kickoffs_returned = int(0)
#roster = store.find(Roster, And(Team.TEAM_ID == team.TEAM_ID, Game.GAME_ID == game.GAME_ID)).one()
roster = store.find(Roster, And(Roster.TEAM_ID == team.TEAM_ID, Roster.GAME_ID == game.GAME_ID)).one()
#for roster in rosters:
# print('for team_id ' + team.TEAM_ID + ' and game_id = ' + str(game.GAME_ID))
if (roster is None):
print('roster not found for TEAM ' + team.TEAM_ID + ' and game ' + game.GAME_ID)
stats = Stats(player_stats_id, player_id, roster.ROSTER_ID, game_result, u'' + game_played, game_started, pass_completed, pass_attempted, pass_percentage, pass_yards, pass_average_yards, pass_td, pass_int, pass_sack, pass_sack_yards, pass_rating, rush_attempts, rush_yards, rush_average, rush_long, rush_td, fumbles, fumbles_lost, receiving_receptions, receiving_yards, receiving_average, receiving_long, receiving_td, fg_blocked, fg_long, fg_attempts, fg_made, fg_percent, xp_made, xp_attempts, xp_percent, xp_blocked, kickoffs, kickoffs_average, kickoffs_touchbacks, kickoffs_returned, rush_long_td, receiving_long_td)
store.add(stats)
print('committing...')
#store.flush()
#store.rollback()
#store.commit()
In [ ]:
### Insert Ks
from storm.locals import *
import csv
import re
from pprint import pprint
import os
import datetime
from os import listdir
from os.path import isfile, join
from os import walk
from dateutil.parser import parse
mypath = '/home/bdetweiler/src/Data Science/Brad Evans Raw Data/NFL/K/'
database = create_database("sqlite:/home/bdetweiler/src/Data Science/brad-evans-analysis/nfl-database/nfl.db")
store = Store(database)
player_stats_id = 70393
for root, subdirs, files in os.walk(mypath):
for fileName in files:
first_name = ''
middle_name = ''
last_name = ''
name_suffix = ''
compact_name = ''
player_id = int(0)
height = float(0)
height_feet = float(0)
height_inches = float(0)
weight = float(0)
birth_day = int(0)
birth_month = int(0)
birth_year = int(0)
birth_date = ''
birth_city = ''
birth_state = ''
birth_country = ''
high_school_name = ''
high_school_city = ''
high_school_state = ''
college = ''
if 'tsv' not in fileName: continue
if 'header.tsv' in fileName: continue
if 'tsv.sh' in fileName: continue
m = re.match("^[a-z'.-]+_([0-9]+)_([1-2][0-9]+).tsv", fileName)
if m is not None:
player_id = int(m.group(1))
year = int(m.group(2))
#print(str(player_id) + ' - ' + str(year))
with open(root + '/' + fileName, 'r') as tsvin:
count = 1
team = None
game = None
tsvin = csv.reader(tsvin, delimiter='\t')
print(fileName)
for line in tsvin:
#print('--------------------- ' + fileName + ' ---------------------------')
if (len(line) == 0): continue
#print(line)
opponent_team = None
player_stats_id += 1
pass_completed = int(0)
pass_attempted = int(0)
pass_percentage = float(0)
pass_yards = float(0)
pass_average_yards = float(0)
pass_td = int(0)
pass_int = int(0)
pass_sack = int(0)
pass_sack_yards = float(0)
pass_rating = float(0)
rush_attempts = int(0)
rush_yards = float(0)
rush_average = float(0)
rush_long = float(0)
rush_long_td = str('0')
rush_td = int(0)
fumbles = int(0)
fumbles_lost = int(0)
receiving_receptions = int(0)
receiving_yards = float(0)
receiving_average = float(0)
receiving_long = float(0)
receiving_long_td = str('0')
receiving_td = int(0)
fg_blocked = int(0)
fg_long = float(0)
fg_attempts = int(0)
fg_made = int(0)
fg_percent = float(0)
xp_made = int(0)
xp_attempts = int(0)
xp_percent = float(0)
xp_blocked = int(0)
kickoffs = int(0)
kickoffs_average = float(0)
kickoffs_touchbacks = int(0)
kickoffs_returned = int(0)
week = int(line[0])
kickoffs_returned_average = float(0)
m = re.match("^([0-9][0-9])/([0-9][0-9])", line[1])
if m is not None:
month = m.group(1)
day = m.group(2)
game_date = datetime.datetime.strptime(str(month) + '/' + str(day) + '/' + str(year), "%m/%d/%Y")
away = False
opponent = ''
m = re.match("^@.*", line[2])
if m is not None:
away = True
n = re.match("^@(.*)", line[2])
if n is not None:
opponent = n.group(1)
else:
opponent = line[2]
#print ('opponent line[2] = ' + line[2])
if (opponent == 'JAC'):
opponent = 'JAX'
if (opponent == 'NPR'):
opponent = 'NFC'
# print('game_date = ' + str(game_date) + ' opponent = ' + opponent)
if opponent_team is None:
#print('looking for opponent, ' + opponent)
opponent_team = store.find(Team, Team.TEAM_ABBRV == unicode(opponent)).one()
#print("select * from team where team_abbrv = '" + opponent + "';")
#test = store.execute("select * from team where team_abbrv = '" + opponent + "';").get_one()
#print(test)
#print(opponent_team)
else:
print(opponent)
print('could not get opponent team')
# Had trouble with dates, but this works, so...
if (opponent_team is not None):
#game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('opponent_team = ' + opponent_team.TEAM_ID + ', game_date = ' + str(game_date))
if (away):
#print('looking up game by date [' + str(game_date) + '] and home_team = [' + opponent_team.TEAM_ID + ']')
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date) + "' AND HOME_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.HOME_TEAM == opponent_team.TEAM_ID)).one()
#print('game (visitor) = ')
#print(game)
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
game_result = game.VISITING_TEAM_RESULT
else:
#my_game = store.execute("SELECT * FROM GAME WHERE GAME_DATE = '" + str(game_date.strftime('%Y-%m-%d')) + "' AND VISITING_TEAM = '" + str(opponent_team.TEAM_ID) + "'").get_one()
#print('my_game = ' + str(my_game))
#print('looking up game by date [' + str(game_date) + '] and visiting_team = [' + opponent_team.TEAM_ID + ']')
# XXX: Not finding Colts, 2013-01-11
game = store.find(Game, And(Game.GAME_DATE == game_date, Game.VISITING_TEAM == opponent_team.TEAM_ID)).one()
#print('game (home) = ')
#print(game)
#store.find(Team, Team.TEAM_ID == u'' + game.VISITING_TEAM).one()
#print(game)
#print('hometeam = ' + str(game.HOME_TEAM))
#team = store.execute("SELECT * FROM TEAM WHERE TEAM_ID = '" + str(game.HOME_TEAM) + "'").get_one()
#print(team)
team = store.find(Team, Team.TEAM_ID == u'' + game.HOME_TEAM).one()
game_result = game.HOME_TEAM_RESULT
if (line[4] != '--') and (line[4] != ''):
game_played = str(line[4])
if (line[5] != '--') and (line[5] != ''):
game_started = str(line[5])
if (line[6] != '--') and (line[6] != ''):
fg_blocked = int(line[6])
if (line[7] != '--') and (line[7] != ''):
fg_long = float(line[7])
if (line[8] != '--') and (line[8] != ''):
fg_attempts = int(line[8])
if (line[9] != '--') and (line[9] != ''):
fg_made = int(line[9])
if (line[10] != '--') and (line[10] != ''):
fg_percent = float(line[10])
if (line[11] != '--') and (line[11] != ''):
xp_made = int(line[11])
if (line[12] != '--') and (line[12] != ''):
xp_attempts = int(line[12])
if (line[13] != '--') and (line[13] != ''):
xp_percent = float(line[13])
if (line[14] != '--') and (line[14] != ''):
xp_blocked = int(line[14])
if (line[15] != '--') and (line[15] != ''):
kickoffs = int(line[15])
if (line[16] != '--') and (line[16] != ''):
kickoffs_average = float(line[16])
if (line[17] != '--') and (line[17] != ''):
kickoffs_touchbacks = int(line[17])
if (line[18] != '--') and (line[18] != ''):
kickoffs_returned = int(line[18])
if (line[19] != '--') and (line[19] != ''):
kickoffs_returned_average = float(line[19])
#if (line[20] != '--') and (line[20] != ''):
# rush_fum = int(line[20])
#if (line[21] != '--') and (line[21] != ''):
# rush_fum_lost = int(line[21])
pass_completed = int(0)
pass_attempted = int(0)
pass_percentage = float(0)
pass_yards = float(0)
pass_average_yards = float(0)
pass_td = int(0)
pass_int = int(0)
pass_sack = int(0)
pass_sack_yards = float(0)
pass_rating = float(0)
rush_attempts = int(0)
rush_yards = float(0)
rush_average = float(0)
rush_long = float(0)
rush_long_td = str('0')
rush_td = int(0)
fumbles = int(0)
fumbles_lost = int(0)
receiving_receptions = int(0)
receiving_yards = float(0)
receiving_average = float(0)
receiving_long = float(0)
receiving_long_td = str('0')
receiving_td = int(0)
#fg_blocked = int(0)
#fg_long = float(0)
#fg_attempts = int(0)
#fg_made = int(0)
#fg_percent = float(0)
#xp_made = int(0)
#xp_attempts = int(0)
#xp_percent = float(0)
#xp_blocked = int(0)
#kickoffs = int(0)
#kickoffs_average = float(0)
#kickoffs_touchbacks = int(0)
#kickoffs_returned = int(0)
roster = store.find(Roster, And(Roster.TEAM_ID == team.TEAM_ID, Roster.GAME_ID == game.GAME_ID)).one()
#for roster in rosters:
# print('for team_id ' + team.TEAM_ID + ' and game_id = ' + str(game.GAME_ID))
if (roster is None):
print('roster not found for TEAM ' + team.TEAM_ID + ' and game ' + game.GAME_ID)
#stats = Stats(player_stats_id, player_id, roster.ROSTER_ID, game_result, u'' + game_played, game_started, pass_completed, pass_attempted, pass_percentage, pass_yards, pass_average_yards, pass_td, pass_int, pass_sack, pass_sack_yards, pass_rating, rush_attempts, rush_yards, rush_average, rush_long, rush_td, fumbles, fumbles_lost, receiving_receptions, receiving_yards, receiving_average, receiving_long, receiving_td, fg_blocked, fg_long, fg_attempts, fg_made, fg_percent, xp_made, xp_attempts, xp_percent, xp_blocked, kickoffs, kickoffs_average, kickoffs_touchbacks, kickoffs_returned, rush_long_td, receiving_long_td, kickoffs_returned_average)
#store.add(stats)
print('committing...')
#store.flush()
#store.rollback()
#store.commit()
In [2]:
# Trimming LAST_NAME
from storm.locals import *
import csv
import re
from pprint import pprint
import os
import datetime
from os import listdir
from os.path import isfile, join
from os import walk
from dateutil.parser import parse
mypath = '/home/bdetweiler/src/Data Science/Brad Evans Raw Data/NFL/K/'
database = create_database("sqlite:/home/bdetweiler/src/Data Science/brad-evans-analysis/nfl-database/nfl.db")
store = Store(database)
players = store.find(Player)
for player in players:
store.find(Player, Player.LAST_NAME == player.LAST_NAME).set(LAST_NAME = player.LAST_NAME.strip())
#player.LAST_NAME = player.LAST_NAME.strip()
print(player.LAST_NAME + '^')
print('committing...')
#store.flush()
#store.rollback()
#store.commit()
In [ ]: